Overview

Dataset statistics

Number of variables: 10
Number of observations: 557
Missing cells: 3
Missing cells (%): 0.1%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 41.4 KiB
Average record size in memory: 76.1 B
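The percentages and the per-record size in this overview follow from the counts. The quick check below uses the displayed, already rounded figures, so the memory result is only approximate. "Missing cells (%)" is taken relative to all cells (3 of 557 × 10 = 5570), which is what the displayed 0.1% implies; relative to rows it would be about 0.5%.

```python
# Figures copied from the dataset statistics above.
n_variables = 10
n_observations = 557
missing_cells = 3
total_size_kib = 41.4  # displayed value, already rounded to one decimal

total_cells = n_variables * n_observations                 # every cell in the table
missing_pct = 100 * missing_cells / total_cells            # share of all cells, not of rows
avg_record_bytes = total_size_kib * 1024 / n_observations  # KiB -> bytes, per row

print(f"Missing cells (%): {missing_pct:.1f}%")
print(f"Average record size in memory: {avg_record_bytes:.1f} B")
```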

Variable types

NUM: 9
CAT: 1

Reproduction

Analysis started: 2020-06-03 11:44:21.559886
Analysis finished: 2020-06-03 11:48:40.393350
Duration: 4 minutes and 18.83 seconds
Version: pandas-profiling v2.8.0
Command line: pandas_profiling --config_file config.yaml [YOUR_FILE.csv]

Warnings

ocean_proximity has constant value "NEAR BAY" (Constant)
households is highly correlated with total_bedrooms (High correlation)
total_bedrooms is highly correlated with households (High correlation)

Variables

longitude
Real number (ℝ)

Distinct count: 21
Unique (%): 3.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: -122.23524236983842
Minimum: -122.34
Maximum: -122.12
Zeros: 0
Zeros (%): 0.0%
Memory size: 4.4 KiB

Quantile statistics

Minimum: -122.34
5th percentile: -122.29
Q1: -122.27
Median: -122.25
Q3: -122.2
95th percentile: -122.16
Maximum: -122.12
Range: 0.22
Interquartile range (IQR): 0.07

Descriptive statistics

Standard deviation: 0.04335092458
Coefficient of variation (CV): -0.0003546516024
Kurtosis: -0.7682353449
Mean: -122.2352424
Median Absolute Deviation (MAD): 0.03
Skewness: 0.4382330124
Sum: -68085.03
Variance: 0.001879302662
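With ten fixed-size bins, each longitude bin covers one tenth of the range: 0.22 / 10 = 0.022 degrees. The sketch below derives equal-width bin edges from the minimum and maximum. It follows the NumPy `histogram` convention (every bin half-open except the last, which also includes the maximum); whether the report's plot used exactly this convention is an assumption.

```python
def equal_width_edges(lo, hi, bins=10):
    """Return the bins + 1 edges of equal-width bins spanning [lo, hi]."""
    width = (hi - lo) / bins
    edges = [lo + i * width for i in range(bins)]
    edges.append(hi)  # use hi itself so floating-point drift cannot cut off the maximum
    return edges

def bin_index(x, edges):
    """Bin containing x: bins are half-open [a, b) except the last, which is closed [a, b]."""
    if x == edges[-1]:
        return len(edges) - 2
    for i in range(len(edges) - 1):
        if edges[i] <= x < edges[i + 1]:
            return i
    raise ValueError(f"{x} lies outside the histogram range")

# Longitude minimum and maximum from the statistics above.
edges = equal_width_edges(-122.34, -122.12, bins=10)
print([round(e, 3) for e in edges])
print(bin_index(-122.34, edges), bin_index(-122.12, edges))  # first and last bin
```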
Histogram with fixed size bins (bins=10)

Common values

Value              Count  Frequency (%)
-122.27               62  11.1%
-122.28               60  10.8%
-122.25               53  9.5%
-122.26               49  8.8%
-122.29               36  6.5%
-122.19               35  6.3%
-122.23               35  6.3%
-122.18               32  5.7%
-122.24               32  5.7%
-122.21               29  5.2%
Other values (11)    134  24.1%

Minimum 5 values

Value              Count  Frequency (%)
-122.34                1  0.2%
-122.33                1  0.2%
-122.31               18  3.2%
-122.29               36  6.5%
-122.28               60  10.8%

Maximum 5 values

Value              Count  Frequency (%)
-122.12                2  0.4%
-122.13                5  0.9%
-122.14                4  0.7%
-122.15                6  1.1%
-122.16               19  3.4%

latitude
Real number (ℝ≥0)

Distinct count: 18
Unique (%): 3.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 37.81089766606823
Minimum: 37.73
Maximum: 37.9
Zeros: 0
Zeros (%): 0.0%
Memory size: 4.4 KiB

Quantile statistics

Minimum: 37.73
5th percentile: 37.75
Q1: 37.77
Median: 37.81
Q3: 37.85
95th percentile: 37.89
Maximum: 37.9
Range: 0.17
Interquartile range (IQR): 0.08

Descriptive statistics

Standard deviation: 0.04414151472
Coefficient of variation (CV): 0.001167428372
Kurtosis: -0.9011517507
Mean: 37.81089767
Median Absolute Deviation (MAD): 0.04
Skewness: 0.2635547511
Sum: 21060.67
Variance: 0.001948473322
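Several of these statistics are simple functions of others. The coefficient of variation is the standard deviation divided by the mean, the variance is the squared standard deviation, the range is maximum minus minimum, and the IQR is Q3 minus Q1. The check below applies these relations to the latitude figures. The small tolerances absorb the rounding of the displayed values.

```python
import math

# Latitude figures copied from the statistics above.
mean = 37.81089766606823
std = 0.04414151472
minimum, maximum = 37.73, 37.9
q1, q3 = 37.77, 37.85

cv = std / mean            # coefficient of variation (dimensionless)
variance = std ** 2
value_range = maximum - minimum
iqr = q3 - q1

print(f"CV = {cv:.10f}, variance = {variance:.12f}")
print(f"range = {value_range:.2f}, IQR = {iqr:.2f}")

# Compare with the displayed values, allowing for their rounding.
assert math.isclose(cv, 0.001167428372, abs_tol=1e-11)
assert math.isclose(variance, 0.001948473322, abs_tol=1e-11)
assert math.isclose(value_range, 0.17, abs_tol=1e-9)
assert math.isclose(iqr, 0.08, abs_tol=1e-9)
```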
Histogram with fixed size bins (bins=10)

Common values

Value              Count  Frequency (%)
37.77                 54  9.7%
37.81                 48  8.6%
37.8                  48  8.6%
37.79                 46  8.3%
37.78                 41  7.4%
37.82                 35  6.3%
37.85                 33  5.9%
37.75                 30  5.4%
37.76                 29  5.2%
37.83                 28  5.0%
Other values (8)     165  29.6%

Minimum 5 values

Value              Count  Frequency (%)
37.73                  7  1.3%
37.74                 20  3.6%
37.75                 30  5.4%
37.76                 29  5.2%
37.77                 54  9.7%

Maximum 5 values

Value              Count  Frequency (%)
37.9                  13  2.3%
37.89                 22  3.9%
37.88                 26  4.7%
37.87                 23  4.1%
37.86                 26  4.7%

housing_median_age
Real number (ℝ≥0)

Distinct count: 41
Unique (%): 7.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 44.81508078994614
Minimum: 2.0
Maximum: 52.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 4.4 KiB

Quantile statistics

Minimum: 2
5th percentile: 26
Q1: 41
Median: 49
Q3: 52
95th percentile: 52
Maximum: 52
Range: 50
Interquartile range (IQR): 11

Descriptive statistics

Standard deviation: 9.211733341
Coefficient of variation (CV): 0.2055498546
Kurtosis: 2.23408609
Mean: 44.81508079
Median Absolute Deviation (MAD): 3
Skewness: -1.536954554
Sum: 24962
Variance: 84.85603115
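The Median Absolute Deviation is the median of the absolute distances from the median. For housing_median_age, a MAD of 3 around a median of 49 means that at least half of the observations have a value between 46 and 52. The sketch below implements the unscaled definition, with no 1.4826 normal-consistency factor. The values reported here appear to be unscaled, but that is an inference. The toy list is hypothetical and not taken from the dataset.

```python
import statistics

def median_absolute_deviation(values):
    """Median of |x - median(values)|, without any scaling constant."""
    med = statistics.median(values)
    return statistics.median([abs(x - med) for x in values])

# Hypothetical toy ages (not from the dataset).
ages = [41, 43, 49, 52, 52]
print(median_absolute_deviation(ages))  # median is 49; deviations 8, 6, 0, 3, 3 -> MAD 3
```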
Histogram with fixed size bins (bins=10)

Common values

Value              Count  Frequency (%)
52                   231  41.5%
50                    23  4.1%
43                    23  4.1%
46                    23  4.1%
49                    23  4.1%
41                    18  3.2%
44                    16  2.9%
42                    16  2.9%
45                    15  2.7%
39                    14  2.5%
Other values (31)    155  27.8%

Minimum 5 values

Value              Count  Frequency (%)
2                      1  0.2%
10                     3  0.5%
13                     1  0.2%
14                     1  0.2%
15                     1  0.2%

Maximum 5 values

Value              Count  Frequency (%)
52                   231  41.5%
51                     6  1.1%
50                    23  4.1%
49                    23  4.1%
48                    12  2.2%

total_rooms
Real number (ℝ≥0)

Distinct count: 516
Unique (%): 92.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1865.5098743267504
Minimum: 12.0
Maximum: 12842.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 4.4 KiB

Quantile statistics

Minimum: 12
5th percentile: 503.8
Q1: 1087
Median: 1668
Q3: 2324
95th percentile: 3770.8
Maximum: 12842
Range: 12830
Interquartile range (IQR): 1237
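An integer-valued column such as total_rooms can have fractional quantiles like the 5th percentile of 503.8 because the quantile position usually falls between two sorted observations. With 557 values, the 5th percentile sits at 0-based position 0.05 × 556 = 27.8, which is 80% of the way from one sorted value to the next. The sketch below implements linear interpolation, the default method of NumPy's `percentile` and pandas' `Series.quantile`. It is an assumption that the report used this method, and the toy list is hypothetical.

```python
import math

def quantile_linear(values, q):
    """Quantile by linear interpolation between the two nearest sorted values."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q        # 0-based fractional position in the sorted data
    lo = math.floor(pos)
    hi = min(lo + 1, len(xs) - 1)  # clamp so q = 1.0 stays in range
    frac = pos - lo
    return xs[lo] + frac * (xs[hi] - xs[lo])

# Hypothetical integer-valued toy data (not from the dataset).
rooms = [12, 96, 105, 135, 142, 500, 880, 954]
print(quantile_linear(rooms, 0.05))  # position 0.35: 35% of the way from 12 to 96
print(quantile_linear(rooms, 0.50))  # position 3.5: halfway between 135 and 142
```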

Descriptive statistics

Standard deviation: 1156.893331
Coefficient of variation (CV): 0.6201485967
Kurtosis: 16.11583934
Mean: 1865.509874
Median Absolute Deviation (MAD): 630
Skewness: 2.57008092
Sum: 1039089
Variance: 1338402.178
Histogram with fixed size bins (bins=10)

Common values

Value              Count  Frequency (%)
880                    2  0.4%
954                    2  0.4%
2492                   2  0.4%
2252                   2  0.4%
2269                   2  0.4%
1813                   2  0.4%
2239                   2  0.4%
909                    2  0.4%
1420                   2  0.4%
2315                   2  0.4%
Other values (506)   537  96.4%

Minimum 5 values

Value              Count  Frequency (%)
12                     1  0.2%
96                     1  0.2%
105                    1  0.2%
135                    1  0.2%
142                    1  0.2%

Maximum 5 values

Value              Count  Frequency (%)
12842                  1  0.2%
7355                   1  0.2%
7099                   1  0.2%
5963                   1  0.2%
5871                   1  0.2%

total_bedrooms
Real number (ℝ≥0)

HIGH CORRELATION

Distinct count: 379
Unique (%): 68.4%
Missing: 3
Missing (%): 0.5%
Infinite: 0
Infinite (%): 0.0%
Mean: 409.2184115523466
Minimum: 4.0
Maximum: 2477.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 4.4 KiB

Quantile statistics

Minimum: 4
5th percentile: 124.95
Q1: 237
Median: 364
Q3: 480.75
95th percentile: 865.75
Maximum: 2477
Range: 2473
Interquartile range (IQR): 243.75

Descriptive statistics

Standard deviation: 287.2340961
Coefficient of variation (CV): 0.7019090247
Kurtosis: 13.78927144
Mean: 409.2184116
Median Absolute Deviation (MAD): 125
Skewness: 2.934365032
Sum: 226707
Variance: 82503.42599
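total_bedrooms is the only variable with missing values (3 of 557), and the reported figures imply two different denominators. The mean and Unique (%) match a division by the 554 present values. The frequency percentages in the value tables match a division by all 557 rows. For example, the "Other values (369)" row holds 509 values, which is 91.4% of 557 but would be 91.9% of 554. The arithmetic below checks this against the reported numbers.

```python
n_rows = 557      # all observations
n_missing = 3     # missing total_bedrooms cells
n_present = n_rows - n_missing

value_sum = 226707                            # "Sum"
distinct = 379                                # "Distinct count"
top_counts = [6, 6, 5, 4, 4, 4, 4, 4, 4, 4]  # counts of the ten most common values
other_count = 509                             # "Other values (369)"

mean = value_sum / n_present                        # matches the reported "Mean"
unique_pct = 100 * distinct / n_present             # matches "Unique (%)"
other_pct_all = 100 * other_count / n_rows          # the frequency the report shows
other_pct_present = 100 * other_count / n_present   # what a 554-row denominator would give

print(f"mean={mean:.4f} unique={unique_pct:.1f}% "
      f"other={other_pct_all:.1f}% (vs {other_pct_present:.1f}%)")

# Every row is accounted for: top ten + other values + missing.
assert sum(top_counts) + other_count + n_missing == n_rows
```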
Histogram with fixed size bins (bins=10)

Common values

Value              Count  Frequency (%)
195                    6  1.1%
460                    6  1.1%
293                    5  0.9%
263                    4  0.7%
184                    4  0.7%
261                    4  0.7%
300                    4  0.7%
574                    4  0.7%
335                    4  0.7%
391                    4  0.7%
Other values (369)   509  91.4%

Minimum 5 values

Value              Count  Frequency (%)
4                      1  0.2%
29                     1  0.2%
31                     2  0.4%
38                     1  0.2%
42                     1  0.2%

Maximum 5 values

Value              Count  Frequency (%)
2477                   1  0.2%
2408                   1  0.2%
2048                   1  0.2%
1914                   1  0.2%
1750                   1  0.2%

population
Real number (ℝ≥0)

Distinct count: 466
Unique (%): 83.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 968.6840215439856
Minimum: 18.0
Maximum: 4985.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 4.4 KiB

Quantile statistics

Minimum: 18
5th percentile: 321
Q1: 582
Median: 863
Q3: 1171
95th percentile: 1931.8
Maximum: 4985
Range: 4967
Interquartile range (IQR): 589

Descriptive statistics

Standard deviation: 580.3511967
Coefficient of variation (CV): 0.5991130067
Kurtosis: 8.811854949
Mean: 968.6840215
Median Absolute Deviation (MAD): 287
Skewness: 2.217827901
Sum: 539557
Variance: 336807.5115
Histogram with fixed size bins (bins=10)

Common values

Value              Count  Frequency (%)
849                    3  0.5%
496                    3  0.5%
1005                   3  0.5%
901                    3  0.5%
506                    3  0.5%
582                    3  0.5%
718                    3  0.5%
1085                   3  0.5%
987                    3  0.5%
1481                   3  0.5%
Other values (456)   527  94.6%

Minimum 5 values

Value              Count  Frequency (%)
18                     1  0.2%
86                     1  0.2%
94                     1  0.2%
98                     1  0.2%
125                    1  0.2%

Maximum 5 values

Value              Count  Frequency (%)
4985                   1  0.2%
4367                   1  0.2%
3741                   1  0.2%
3668                   1  0.2%
3469                   1  0.2%

households
Real number (ℝ≥0)

HIGH CORRELATION

Distinct count: 376
Unique (%): 67.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 385.8563734290844
Minimum: 7.0
Maximum: 2323.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 4.4 KiB

Quantile statistics

Minimum: 7
5th percentile: 120.6
Q1: 223
Median: 339
Q3: 460
95th percentile: 804.6
Maximum: 2323
Range: 2316
Interquartile range (IQR): 237

Descriptive statistics

Standard deviation: 269.9745009
Coefficient of variation (CV): 0.6996761476
Kurtosis: 12.64546671
Mean: 385.8563734
Median Absolute Deviation (MAD): 120
Skewness: 2.838959545
Sum: 214922
Variance: 72886.23113
Histogram with fixed size bins (bins=10)

Common values

Value              Count  Frequency (%)
259                    5  0.9%
355                    5  0.9%
244                    4  0.7%
335                    4  0.7%
422                    4  0.7%
155                    4  0.7%
350                    4  0.7%
340                    4  0.7%
133                    4  0.7%
255                    3  0.5%
Other values (366)   516  92.6%

Minimum 5 values

Value              Count  Frequency (%)
7                      1  0.2%
23                     1  0.2%
34                     1  0.2%
39                     1  0.2%
47                     2  0.4%

Maximum 5 values

Value              Count  Frequency (%)
2323                   1  0.2%
2051                   1  0.2%
1967                   1  0.2%
1789                   1  0.2%
1742                   1  0.2%

median_income
Real number (ℝ≥0)

Distinct count: 530
Unique (%): 95.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.382615798922801
Minimum: 0.4999
Maximum: 13.499
Zeros: 0
Zeros (%): 0.0%
Memory size: 4.4 KiB

Quantile statistics

Minimum: 0.4999
5th percentile: 1.18998
Q1: 2.0938
Median: 2.8321
Q3: 4.0469
95th percentile: 7.49056
Maximum: 13.499
Range: 12.9991
Interquartile range (IQR): 1.9531

Descriptive statistics

Standard deviation: 1.977647742
Coefficient of variation (CV): 0.5846504184
Kurtosis: 3.452196142
Mean: 3.382615799
Median Absolute Deviation (MAD): 0.8983
Skewness: 1.634949641
Sum: 1884.117
Variance: 3.911090592
Histogram with fixed size bins (bins=10)

Common values

Value              Count  Frequency (%)
2.875                  4  0.7%
3.875                  3  0.5%
3.225                  2  0.4%
2.6429                 2  0.4%
2.5139                 2  0.4%
1.9028                 2  0.4%
4.7708                 2  0.4%
1.1667                 2  0.4%
2.537                  2  0.4%
2.125                  2  0.4%
Other values (520)   534  95.9%

Minimum 5 values

Value              Count  Frequency (%)
0.4999                 1  0.2%
0.7286                 1  0.2%
0.75                   1  0.2%
0.76                   1  0.2%
0.7683                 1  0.2%

Maximum 5 values

Value              Count  Frequency (%)
13.499                 1  0.2%
12.3804                1  0.2%
12.2138                1  0.2%
11.8603                1  0.2%
11.6017                1  0.2%

median_house_value
Real number (ℝ≥0)

Distinct count: 455
Unique (%): 81.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 198326.22980251347
Minimum: 60000.0
Maximum: 500001.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 4.4 KiB

Quantile statistics

Minimum: 60000
5th percentile: 82080
Q1: 112800
Median: 169300
Q3: 259600
95th percentile: 391240
Maximum: 500001
Range: 440001
Interquartile range (IQR): 146800

Descriptive statistics

Standard deviation: 103369.5674
Coefficient of variation (CV): 0.5212097639
Kurtosis: 0.235843744
Mean: 198326.2298
Median Absolute Deviation (MAD): 66500
Skewness: 0.9526447519
Sum: 110467710
Variance: 1.068526747e+10
Histogram with fixed size bins (bins=10)

Common values

Value              Count  Frequency (%)
500001                10  1.8%
137500                 7  1.3%
112500                 6  1.1%
162500                 5  0.9%
125000                 5  0.9%
150000                 4  0.7%
350000                 4  0.7%
98200                  3  0.5%
140600                 3  0.5%
216700                 3  0.5%
Other values (445)   507  91.0%

Minimum 5 values

Value              Count  Frequency (%)
60000                  1  0.2%
67500                  1  0.2%
70000                  1  0.2%
71300                  1  0.2%
72000                  1  0.2%

Maximum 5 values

Value              Count  Frequency (%)
500001                10  1.8%
489600                 1  0.2%
483300                 1  0.2%
471600                 1  0.2%
466100                 1  0.2%

ocean_proximity
Categorical

CONSTANT
REJECTED

Distinct count: 1
Unique (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 2.2 KiB
Common values

Value              Count  Frequency (%)
NEAR BAY             557  100.0%

Length

Max length: 8
Median length: 8
Mean length: 8
Min length: 8
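The length statistics and the CONSTANT flag for ocean_proximity follow from its values. All 557 entries are "NEAR BAY", an 8-character string, so the column has exactly one distinct value. The sketch below recomputes both from a list reconstructed from the frequency table. Treating "exactly one distinct value" as the report's criterion for CONSTANT is an assumption.

```python
import statistics

# Reconstructed from the frequency table: all 557 values are "NEAR BAY".
ocean_proximity = ["NEAR BAY"] * 557

lengths = [len(v) for v in ocean_proximity]
print("Max length:", max(lengths))
print("Median length:", statistics.median(lengths))
print("Mean length:", statistics.mean(lengths))
print("Min length:", min(lengths))

# A single distinct value is what the CONSTANT warning describes.
is_constant = len(set(ocean_proximity)) == 1
print("Constant:", is_constant)
```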

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation and 1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables. As a consequence, for an exact, non-horizontal linear relationship the slope (the angle of the line to the x-axis) does not affect r; only its sign matters, giving r = +1 or r = -1.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
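The definition can be expressed directly in code. The minimal sketch below, in plain Python, applies it to total_bedrooms and households from rows 0-9 of the "First rows" sample, which is the pair the warnings flag as highly correlated. Ten rows are only an illustration and will not reproduce a coefficient computed over all 557 rows. The second call checks the invariance under a change of location and positive scale.

```python
import math

def pearson_r(xs, ys):
    """Covariance of X and Y divided by the product of their standard deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / (n - 1))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys) / (n - 1))
    return cov / (sx * sy)

# total_bedrooms and households for rows 0-9 of the "First rows" sample.
bedrooms = [129, 1106, 190, 235, 280, 213, 489, 687, 665, 707]
households = [126, 1138, 177, 219, 259, 193, 514, 647, 595, 714]

r = pearson_r(bedrooms, households)
# Shifting and positively rescaling one variable leaves r unchanged.
r_transformed = pearson_r([3 * b + 50 for b in bedrooms], households)
print(f"r = {r:.4f}, after transform = {r_transformed:.4f}")
```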

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better than Pearson's r at capturing nonlinear monotonic relationships. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation and 1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
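Following that description, ρ is Pearson's r applied to the ranks. The minimal sketch below, in plain Python, gives tied values the average of the ranks they span. This is the usual convention and matches the default `method="average"` of pandas' `Series.rank`. The example uses a monotonic but nonlinear relationship, y = x³, where ρ is 1 but r is not.

```python
import math

def average_ranks(values):
    """1-based ranks; tied values all receive the average of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Advance j to the last position of the run of values tied with order[i].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks

def pearson_r(xs, ys):
    """Covariance divided by the product of the standard deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)  # the 1/(n-1) factors cancel

def spearman_rho(xs, ys):
    """Pearson's r computed on the rank variables of X and Y."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

# Monotonic but nonlinear: y = x ** 3.
xs = [1, 2, 3, 4, 5, 6]
ys = [x ** 3 for x in xs]
print(f"Spearman rho = {spearman_rho(xs, ys):.4f}")  # ordering is perfectly preserved
print(f"Pearson r    = {pearson_r(xs, ys):.4f}")     # below 1, the relationship is not linear
```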

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation and 1 indicating total positive correlation.

To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. τ equals the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs, n(n-1)/2 for n observations.
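The sketch below implements the pair-counting definition directly. It computes what is usually called τ-a, which agrees with the description when there are no ties. With ties, `scipy.stats.kendalltau` computes the τ-b variant by default, and pandas' Kendall correlation calls it. τ-b adjusts the denominator for tied pairs. The O(n²) loop is suitable for illustration only.

```python
from itertools import combinations

def kendall_tau_a(xs, ys):
    """(concordant - discordant) / total number of pairs (tau-a)."""
    n = len(xs)
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        s = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if s > 0:
            concordant += 1   # both variables move in the same direction
        elif s < 0:
            discordant += 1   # the variables move in opposite directions
        # s == 0 is a tie in x or y; tau-a counts it in neither group
    return (concordant - discordant) / (n * (n - 1) / 2)

xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 5]
print(kendall_tau_a(xs, ys))  # 8 concordant, 2 discordant of 10 pairs -> 0.6
```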

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient for a bivariate normal input distribution. Extensive documentation is available in the phik package documentation.

Missing values

Sample

First rows

     longitude  latitude  housing_median_age  total_rooms  total_bedrooms  population  households  median_income  median_house_value  ocean_proximity
  0    -122.23     37.88                41.0        880.0           129.0       322.0       126.0         8.3252            452600.0  NEAR BAY
  1    -122.22     37.86                21.0       7099.0          1106.0      2401.0      1138.0         8.3014            358500.0  NEAR BAY
  2    -122.24     37.85                52.0       1467.0           190.0       496.0       177.0         7.2574            352100.0  NEAR BAY
  3    -122.25     37.85                52.0       1274.0           235.0       558.0       219.0         5.6431            341300.0  NEAR BAY
  4    -122.25     37.85                52.0       1627.0           280.0       565.0       259.0         3.8462            342200.0  NEAR BAY
  5    -122.25     37.85                52.0        919.0           213.0       413.0       193.0         4.0368            269700.0  NEAR BAY
  6    -122.25     37.84                52.0       2535.0           489.0      1094.0       514.0         3.6591            299200.0  NEAR BAY
  7    -122.25     37.84                52.0       3104.0           687.0      1157.0       647.0         3.1200            241400.0  NEAR BAY
  8    -122.26     37.84                42.0       2555.0           665.0      1206.0       595.0         2.0804            226700.0  NEAR BAY
  9    -122.25     37.84                52.0       3549.0           707.0      1551.0       714.0         3.6912            261100.0  NEAR BAY

Last rows

     longitude  latitude  housing_median_age  total_rooms  total_bedrooms  population  households  median_income  median_house_value  ocean_proximity
547    -122.27     37.77                52.0       1710.0           481.0       849.0       457.0         2.7115            220800.0  NEAR BAY
548    -122.27     37.77                52.0       2388.0           559.0      1121.0       518.0         3.3269            234500.0  NEAR BAY
549    -122.25     37.77                52.0       2156.0           458.0       872.0       445.0         3.2685            254200.0  NEAR BAY
550    -122.26     37.77                52.0       1565.0           315.0       637.0       297.0         4.7778            351800.0  NEAR BAY
551    -122.26     37.77                52.0       1704.0           371.0       663.0       340.0         4.2260            275000.0  NEAR BAY
552    -122.26     37.77                52.0       1210.0           168.0       411.0       172.0         3.3571            405400.0  NEAR BAY
553    -122.26     37.77                52.0       2097.0           444.0       915.0       413.0         2.9899            228100.0  NEAR BAY
554    -122.26     37.77                52.0       1848.0           479.0       921.0       477.0         2.8750            234000.0  NEAR BAY
555    -122.24     37.77                43.0        955.0           284.0       585.0       266.0         2.3882            162500.0  NEAR BAY
556    -122.25     37.77                43.0       4329.0          1110.0      2086.0      1053.0         2.9750            243400.0  NEAR BAY